feat(seo): site-wide SEO optimization: sitemap / JSON-LD / canonical / robots #289

Merged
longsizhuo merged 1 commit into main from feat/seo-optimization
Apr 16, 2026
@longsizhuo
Member

Summary

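Added structured data (JSON-LD)

  • Global WebSite + SearchAction: Google may show an in-site search box (Sitelinks Search Box) directly below the search result
  • docs pages: TechArticle + BreadcrumbList, for technical-article rich results and the breadcrumb hierarchy
  • /u/[username]: Person, a candidate for a personal-profile knowledge panel, including a sameAs GitHub link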
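
As an illustration of the global WebSite + SearchAction item, here is a minimal sketch of how such a JSON-LD object could be built. The site URL, site name, and the `/search?q=` route are placeholders, not values taken from this PR, and `buildWebSiteJsonLd` is a hypothetical helper name.

```typescript
// Shape of a schema.org WebSite object carrying a SearchAction.
interface WebSiteJsonLd {
  "@context": "https://schema.org";
  "@type": "WebSite";
  url: string;
  name: string;
  potentialAction: {
    "@type": "SearchAction";
    target: { "@type": "EntryPoint"; urlTemplate: string };
    "query-input": string;
  };
}

// Hypothetical helper: the search route below is an assumption, not the site's real one.
function buildWebSiteJsonLd(siteUrl: string, name: string): WebSiteJsonLd {
  return {
    "@context": "https://schema.org",
    "@type": "WebSite",
    url: siteUrl,
    name,
    potentialAction: {
      "@type": "SearchAction",
      target: {
        "@type": "EntryPoint",
        // {search_term_string} is the placeholder that "query-input" refers to.
        urlTemplate: `${siteUrl}/search?q={search_term_string}`,
      },
      "query-input": "required name=search_term_string",
    },
  };
}

// Placeholder values for illustration only.
const webSiteJsonLd = buildWebSiteJsonLd("https://example.com", "Example Site");
```

The resulting object would then be serialized into a `<script type="application/ld+json">` tag in the root layout.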

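Sitemap expansion (from ~300 to 312 entries, adding rank + contributor profiles)

  • Added a /rank entry
  • Added /u/{githubId}: enumerates every contributor in the leaderboard JSON (non-contributor profiles stay out of the sitemap to save crawl budget)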
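
The sitemap additions can be sketched roughly as follows. `SitemapEntry` is a local stand-in for the entry shape Next.js expects from `app/sitemap.ts`, and `LeaderboardRow` is a hypothetical schema, since the real leaderboard JSON is not shown in this PR. The `weekly` frequency for profiles is likewise an assumption.

```typescript
// Local stand-in for a Next.js sitemap entry (a subset of its fields).
interface SitemapEntry {
  url: string;
  changeFrequency?: "always" | "hourly" | "daily" | "weekly" | "monthly" | "yearly" | "never";
}

// Hypothetical leaderboard row; only the numeric GitHub id is needed here.
interface LeaderboardRow {
  githubId: number;
}

// Builds the extra entries: one for /rank plus one per distinct contributor.
function buildSitemapExtras(siteUrl: string, leaderboard: LeaderboardRow[]): SitemapEntry[] {
  const seen = new Set<number>();
  const profiles: SitemapEntry[] = [];
  for (const row of leaderboard) {
    if (seen.has(row.githubId)) continue; // skip duplicate contributors
    seen.add(row.githubId);
    // "weekly" is an assumed value; the PR does not state one for profiles.
    profiles.push({ url: `${siteUrl}/u/${row.githubId}`, changeFrequency: "weekly" });
  }
  return [{ url: `${siteUrl}/rank`, changeFrequency: "daily" }, ...profiles];
}
```

Because only leaderboard rows are enumerated, profiles of users who never contributed are left out of the sitemap.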

canonical + hreflang

  • docs [...slug]: canonical points to the original slug path; alternates.languages declares zh-CN / en-US / x-default
  • /u/[username]: canonical uses the numeric githubId path, so that the github_<id> and numeric URLs do not compete for PageRank
  • /rank, /login, and /settings each get a canonical
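
To make the profile canonical rule concrete, here is a small sketch that maps both URL forms to the numeric path. It assumes the `[username]` route param arrives either as a bare numeric id or as `github_<id>`; `profileCanonical` is a hypothetical helper, not a function from this PR.

```typescript
// Maps "github_<id>" and "<id>" to the same canonical path, "/u/<id>".
// Returns null for anything that is not one of those two forms.
function profileCanonical(param: string): string | null {
  const match = /^(?:github_)?(\d+)$/.exec(param);
  return match ? `/u/${match[1]}` : null;
}
```

Both `/u/github_114939201` and `/u/114939201` would then declare `/u/114939201` as their canonical URL.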

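robots adjustments

  • Removed nocache: true (it was actually suppressing rich snippets)
  • googleBot: max-image-preview=large, max-snippet=-1, max-video-preview=-1, letting Google decide the snippet length itself
  • /login and /settings: index=false (login and preferences pages do not need to be indexed)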
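
The robots settings above might look like the following metadata objects. The types are local stand-ins modelled on the `robots` field of Next.js metadata, `index: true` / `follow: true` are assumed defaults, and `toRobotsContent` is a hypothetical helper that only approximates the meta tag content a framework would emit.

```typescript
// Local stand-in for the directive set used in a robots metadata object.
interface RobotsDirectives {
  index?: boolean;
  follow?: boolean;
  "max-image-preview"?: "none" | "standard" | "large";
  "max-snippet"?: number;
  "max-video-preview"?: number;
}

// Global Googlebot directives: -1 means "no limit", so Google picks the length.
const googleBotDirectives: RobotsDirectives = {
  index: true, // assumed default
  follow: true, // assumed default
  "max-image-preview": "large",
  "max-snippet": -1,
  "max-video-preview": -1,
};

// Per-page override for /login and /settings.
const privatePageDirectives: RobotsDirectives = { index: false };

// Hypothetical helper: renders directives as a robots meta "content" string.
function toRobotsContent(d: RobotsDirectives): string {
  const parts: string[] = [];
  if (d.index !== undefined) parts.push(d.index ? "index" : "noindex");
  if (d.follow !== undefined) parts.push(d.follow ? "follow" : "nofollow");
  if (d["max-image-preview"] !== undefined) parts.push(`max-image-preview:${d["max-image-preview"]}`);
  if (d["max-snippet"] !== undefined) parts.push(`max-snippet:${d["max-snippet"]}`);
  if (d["max-video-preview"] !== undefined) parts.push(`max-video-preview:${d["max-video-preview"]}`);
  return parts.join(", ");
}
```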

per-page metadata

  • /rank: added title / description / OG
  • /u/[username]: OG uses the user's avatarUrl in place of the global og/cover.png; the Twitter card follows suit
  • docs pages: OG adds type=article, and locale follows the current language

Test plan

  • pnpm typecheck passes
  • curl localhost:3010/sitemap.xml → 312 entries, including /rank and 21 /u/* entries
  • curl /u/114939201: HTML contains the Person JSON-LD
  • curl /docs/ai/llm-basics/pytorch: HTML contains TechArticle + BreadcrumbList JSON-LD + canonical
  • After the Vercel Preview deployment, validate the structured data with the Rich Results Test
  • Resubmit the sitemap in Google Search Console and monitor crawl stats

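**Added structured data (JSON-LD):**
- Global WebSite + SearchAction (so Google may show an in-site search box below the search result)
- docs pages: TechArticle + BreadcrumbList (technical-article rich results + breadcrumb hierarchy)
- /u/[username] pages: Person (candidate for a personal-profile knowledge panel)

**Sitemap expansion (from homepage + docs only → 312 entries):**
- Added a /rank entry (changeFreq=daily)
- Added /u/{githubId} entries (enumerates every contributor in the leaderboard JSON; non-contributor profiles stay out of the sitemap to save crawl budget)

**canonical + hreflang:**
- docs [...slug] pages: canonical points to the original slug path; alternates.languages declares zh-CN / en-US / x-default
- /u/[username]: canonical uses the numeric githubId path, so that the github_<id> and numeric URLs do not compete for PageRank
- /rank, /login, and /settings each get a canonical

**robots adjustments:**
- Removed nocache: true (it was actually suppressing rich snippets)
- Opened up max-image-preview=large / max-snippet=-1 on googleBot, letting Google decide the snippet length itself
- /login and /settings set index=false (login and preferences pages do not need to be indexed by search engines)

**per-page metadata:**
- /rank: added title / description / OG
- /u/[username]: OG overridden from the global og/cover.png to the user's avatarUrl
- docs pages: OG adds type=article, and locale follows the current language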
@vercel

vercel Bot commented Apr 16, 2026

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| involutionhell-github-io | Ready | Preview, Comment | Apr 16, 2026 8:20pm |
| website-preview | Ready | Preview, Comment | Apr 16, 2026 8:20pm |

Copilot AI review requested due to automatic review settings April 16, 2026 19:03
@longsizhuo longsizhuo merged commit 3845146 into main Apr 16, 2026
8 of 10 checks passed
@longsizhuo longsizhuo deleted the feat/seo-optimization branch April 16, 2026 19:07
Contributor

Copilot AI left a comment

Pull request overview

This PR implements site-wide SEO improvements for the Next.js app by expanding sitemap coverage, adding structured data (JSON-LD), and tightening canonical/robots metadata to reduce duplicate indexing and improve rich results eligibility.

Changes:

  • Add JSON-LD structured data for the site (WebSite + SearchAction), docs pages (TechArticle + BreadcrumbList), and user profiles (Person).
  • Expand sitemap.xml to include /rank and contributor profile pages (/u/<githubId>) sourced from the build-time leaderboard JSON.
  • Add/adjust per-page metadata (canonical, OG/Twitter, robots noindex) for /rank, /u/[username], /login, /settings, and docs pages.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| app/u/[username]/page.tsx | Adds canonical/OG/Twitter for profiles and injects Person JSON-LD. |
| app/sitemap.ts | Adds /rank and contributor /u/<id> entries to the sitemap. |
| app/settings/page.tsx | Adds canonical + robots: noindex metadata for settings page. |
| app/rank/page.tsx | Adds canonical + title/description + OG metadata for rank page. |
| app/login/page.tsx | Adds canonical + robots: noindex metadata for login page. |
| app/layout.tsx | Updates global robots directives and adds global WebSite + SearchAction JSON-LD. |
| app/docs/[...slug]/page.tsx | Adds docs JSON-LD and sets canonical/hreflang + OG/Twitter metadata. |

Comment thread app/u/[username]/page.tsx
Comment on lines +374 to +376
    ...(user.githubId
      ? { sameAs: [`https://github.com/${user.githubId}`] }
      : {}),
Copilot AI Apr 16, 2026

sameAs is being set to https://github.com/${user.githubId} but githubId is a numeric GitHub user id; GitHub profile URLs use the login/username, so this link will be incorrect. Consider omitting sameAs unless you have the GitHub login (or using a verified GitHub profile URL from user preferences) to avoid emitting invalid structured data.

Suggested change
-    ...(user.githubId
-      ? { sameAs: [`https://github.com/${user.githubId}`] }
-      : {}),
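
If the GitHub login were available, one way to keep sameAs without emitting a wrong URL is sketched below. The `githubLogin` field and the `buildPersonJsonLd` helper are hypothetical; the PR only shows `githubId` on the user object.

```typescript
// Hypothetical user shape; githubLogin is an assumed field, not one from the PR.
interface ProfileUser {
  username: string;
  githubId?: number;
  githubLogin?: string;
}

interface PersonJsonLd {
  "@context": "https://schema.org";
  "@type": "Person";
  name: string;
  url: string;
  sameAs?: string[];
}

// Emits sameAs only when a login is known; a numeric id alone is not a profile URL.
function buildPersonJsonLd(user: ProfileUser, profileUrl: string): PersonJsonLd {
  return {
    "@context": "https://schema.org",
    "@type": "Person",
    name: user.username,
    url: profileUrl,
    ...(user.githubLogin
      ? { sameAs: [`https://github.com/${encodeURIComponent(user.githubLogin)}`] }
      : {}),
  };
}
```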

Comment thread app/u/[username]/page.tsx
Comment on lines +383 to +389

  return (
    <>
      <script
        type="application/ld+json"
        // eslint-disable-next-line react/no-danger
        dangerouslySetInnerHTML={{ __html: JSON.stringify(personJsonLd) }}
Copilot AI Apr 16, 2026

The JSON-LD script is built from user-controlled fields (e.g. preferences.bio, user.username) and injected via dangerouslySetInnerHTML with a raw JSON.stringify(...). If any field contains </script> (or <), it can break out of the script tag and enable XSS. Escape unsafe characters in the serialized JSON-LD (commonly replacing < with \u003c, and also >/& as needed) before injecting.

Suggested change
-  return (
-    <>
-      <script
-        type="application/ld+json"
-        // eslint-disable-next-line react/no-danger
-        dangerouslySetInnerHTML={{ __html: JSON.stringify(personJsonLd) }}
+  const personJsonLdString = JSON.stringify(personJsonLd)
+    .replace(/</g, "\\u003c")
+    .replace(/>/g, "\\u003e")
+    .replace(/&/g, "\\u0026")
+    .replace(/\u2028/g, "\\u2028")
+    .replace(/\u2029/g, "\\u2029");
+  return (
+    <>
+      <script
+        type="application/ld+json"
+        // eslint-disable-next-line react/no-danger
+        dangerouslySetInnerHTML={{ __html: personJsonLdString }}

Comment on lines +198 to +202
  // Normalize slug → canonical path. Whether the user visits /docs/ai/rl (original) or /docs/ai/rl.en (translation),
  // both point to the original slug, so two URLs do not compete for the PageRank of the same content.
  const slugPath = (slug ?? []).join("/");
  const canonical = slugPath ? `/docs/${slugPath}` : "/docs";

Copilot AI Apr 16, 2026

canonical is currently derived from the raw route params (slugPath). Because the docs tree includes language-suffixed routes like *.zh / *.en (e.g. 01-static-array.zh), this will set canonical to the language-suffixed URL, which contradicts the intent in the comment (and the PR description) to consolidate ranking onto the base slug. Consider normalizing the canonical by stripping a trailing .en/.zh from the last slug segment (and using that normalized slug consistently for alternates.canonical / OG url).
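
A minimal sketch of the normalization this comment describes, assuming the only language suffixes are `.en` and `.zh` and that they appear only on the last slug segment. `canonicalDocsPath` is a hypothetical helper name.

```typescript
// Matches a trailing language suffix such as ".en" or ".zh".
const LANG_SUFFIX = /\.(en|zh)$/;

// Strips the language suffix from the last segment, then builds the canonical path.
function canonicalDocsPath(slug: string[] | undefined): string {
  const segments = [...(slug ?? [])]; // copy so the route params are not mutated
  const tail = segments.pop();
  if (tail !== undefined) {
    segments.push(tail.replace(LANG_SUFFIX, ""));
  }
  const slugPath = segments.join("/");
  return slugPath ? `/docs/${slugPath}` : "/docs";
}
```

With this in place, `/docs/ai/rl` and `/docs/ai/rl.en` would both resolve to the canonical `/docs/ai/rl`, and the same normalized value can be reused for the OG url.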

Comment on lines +139 to +148
      <script
        type="application/ld+json"
        // eslint-disable-next-line react/no-danger
        dangerouslySetInnerHTML={{ __html: JSON.stringify(articleJsonLd) }}
      />
      <script
        type="application/ld+json"
        // eslint-disable-next-line react/no-danger
        dangerouslySetInnerHTML={{ __html: JSON.stringify(breadcrumbJsonLd) }}
      />
Copilot AI Apr 16, 2026

The JSON-LD blobs are injected with dangerouslySetInnerHTML and a raw JSON.stringify(...). If any doc frontmatter field ever contains </script> / < (titles/descriptions are user-editable content in this repo), it can break out of the script tag. Escape unsafe characters in the serialized JSON-LD (at least replace < with \u003c) before injecting.
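
Since the same concern applies to every JSON-LD script tag, one option is a single shared serializer. The sketch below uses a hypothetical helper name, `serializeJsonLd`, and escapes `<`, `>`, and `&` as Unicode escapes so the output can never contain a literal `</script>`.

```typescript
// Serializes JSON-LD for embedding in a <script> tag.
// The escaped characters are still valid JSON, so parsers read back the original strings.
function serializeJsonLd(data: object): string {
  return JSON.stringify(data)
    .replace(/</g, "\\u003c")
    .replace(/>/g, "\\u003e")
    .replace(/&/g, "\\u0026");
}

// Example with a hostile title, as might come from doc frontmatter.
const hostile = { headline: "</script><script>alert(1)</script>" };
const safe = serializeJsonLd(hostile);
```

Each `dangerouslySetInnerHTML={{ __html: ... }}` call site could then pass `serializeJsonLd(articleJsonLd)` instead of a raw `JSON.stringify(...)`.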
